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ABSTRACT 


It is a well-known and intensively studied phe- 
nomenon that the levels of many miRNAs are differ- 
entiated in cancer. miRNA biogenesis and functional 
expression are complex processes orchestrated by 
many proteins cumulatively called miRNA biogene- 
sis proteins. To characterize cancer somatic muta- 
tions in the miRNA biogenesis genes and investigate 
their potential impact on the levels of miRNAs, we 
analyzed whole-exome sequencing datasets of over 
10 000 cancer/normal sample pairs deposited within 
the TCGA repository. We identified and characterized 
over 3600 somatic mutations in 29 miRNA biogenesis 
genes and showed that some of the genes are over- 
mutated in specific cancers and/or have recurrent 
hotspot mutations (e.g. SMAD4 in PAAD, COAD and 
READ; DICER1 in UCEC; PRKRA in OV and LIN28B 
in SKCM). We identified a list of miRNAs whose level 
is affected by particular types of mutations in either 
SMAD4, SMAD2 or DICER1 and showed that hotspot 
mutations in the RNase domains in DICER1 not only 
decrease the level of 5p-miRNAs but also increase 
the level of 3p-miRNAs, including many well-known 
cancer-related miRNAs. We also showed an associ- 
ation of the mutations with patient survival. Eventu- 
ally, we created an atlas/compendium of miRNA bio- 
genesis alterations providing a useful resource for 
different aspects of biomedical research. 


INTRODUCTION 


Since the first reports of microRNA (miRNA) contribu- 
tions to B-cell chronic lymphocytic leukemia (1), we have 
observed a substantial increase in reports describing the 
role of these small regulatory RNA molecules in different 


human diseases, including cancers (summarized in (2)). It 
was suggested that miRNAs are globally downregulated in 
cancer (3) and that upregulation or downregulation of cer- 
tain miRNAs acting either as oncogenes or tumor suppres- 
sors may contribute to cancer development and progression 
(4,5). Numerous miRNA profiling studies have led to the 
identification of many miRNAs specifically altered in dif- 
ferent types or subtypes of cancer. Many of these miRNAs 
play an important role in carcinogenesis and the regulation 
of different cancer-related processes, such as cell growth and 
differentiation, cell migration, apoptosis, and epithelial-to- 
mesenchymal transition (MET). Additionally, many miR- 
NAs have been implicated as diagnostic and prognostic 
biomarkers and/or as potential therapeutic targets in can- 
cer [e.g. (6—-9)]. 

miRNAs are generated through a multistage process of 
miRNA biogenesis tightly controlled by various proteins 
consecutively nursing primary miRNA transcripts (pri- 
miRNAs) from their transcription to their cellular function 
within the miRNA-induced silencing complex (miRISC) 
(10-13). The major steps of canonical miRNA biogenesis 
include nuclear pri-miR NA processing by the microproces- 
sor complex, whose core is formed by RNase DROSHA 
acting together with DGCR8 dimer and several other regu- 
latory proteins, including P68 (DDX5) and P72 (DDX17), 
to release from the pri-miRNA hairpin-shaped secondary 
precursor (pre-miRNA). Next, pre-miRNA is exported to 
the cytoplasm by the Exportin-5(XPO5):Ran-GTP(RAN) 
complex, where it is intercepted by the multiprotein miRISC 
loading complex (RLC) containing the RNase DICER1, 
which cuts off the pre-miRNA apical loop to release an 
~22-bp-long miRNA duplex in assistance of partner pro- 
teins such as PRK RA (PACT) or TARBP1 (TRBP). Within 
the miRISC, the miRNA duplex is unwound (supported 
by, 1.a., GEMIN4 and MOV10) to select a miRNA guide 
strand (mature miRNA) that recognizes mRNA targets by 
complementary interaction and silences them with the as- 
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sistance of AGO and TNRC6A (GW182) proteins by trans- 
lation repression and/or RNA deadenylation and degrada- 
tion. Each step of this process may be further regulated by 
additional mechanisms/proteins that either increase or de- 
crease the miRNA biogenesis rate (11,14,15). For example, 
LIN28A/B binds to the apical loop of specific pre-miR NAs, 
including pre-let-7, and upon its uridylation by ZCCHC11 
or ZCCHC6 TUTases leads to pre-miRNA degradation 
by DIS3L2 exonuclease (16,17). It should also be noted 
that there are alternative pathways of miRNA biogenesis, 
such as the generation of miRNAs (mirtrons) from specific 
short introns (DROSHA-independent) (18) or DICER1- 
independent processing of miR-45la by AGO2 (19,20). 

There are currently >2600 human miRNAs deposited in 
miR Base (21,22). It is speculated that miR NAs regulate the 
expression of most protein-coding genes (23,24). miRNA 
levels and consequently levels of controlled genes may be af- 
fected by various factors and processes. First, the expression 
of miRNA genes, such as the expression of protein-coding 
genes, is regulated by various transcription factors, such as 
MYC, TP53 or SMAD4 (25-28). It was shown that many 
miRNA genes are located in copy number-variable (CNV) 
regions and are frequently amplified or deleted in cancer 
(29-31). miRNA genes may also be affected by aberrant 
DNA methylation and histone acetylation, leading to the 
silencing of the miRNA genes (32-38). Additionally, methy- 
lation of miRNA precursors may facilitate their processing 
(39) as well as impair the ability of miRNAs to downregu- 
late their targets (40). It was also shown that the occurrence 
of single nucleotide polymorphisms (SNPs) (41-44) and 
germline or somatic mutations (45-49) may affect miRNA 
processing (level) and the ability of miRNAs to recognize 
their targets. Mature miRNAs may be captured and inacti- 
vated by cellular miRNA sponges such as IncRNAs, circR- 
NAs, or pseudogenes (50,51). Finally, the deficiency or im- 
pairment of the components of miRNA gene transcription 
activators, miRNA processing machinery and the miRISC 
complex (for simplicity, cumulatively called miRNA bio- 
genesis genes/proteins) described in the previous paragraph 
may affect miRNA levels and the effectiveness of miRNA 
gene silencing, respectively (52,53). 

A body of evidence has indicated that deleterious 
germline mutations in the DICERI gene are responsible 
for DICER1 syndrome, an inherited disorder characterized 
by an increased frequency of various types of malignant 
and benign tumors that occur predominantly in infants and 
young children, the most common and most characteris- 
tic of which is pleuropulmonary blastoma (54,55). It was 
shown that in cancers associated with DICERI1 syndrome 
as well as other early childhood cancers (e.g. Wilms’ tu- 
mor), a specific pattern of somatic DICER/ second-hit mis- 
sense mutations occurs. All these mutations are located in 
or adjacent to metal-ion-binding residues (hotspots; pre- 
dominantly D1709 and E1813) of the RNase IIIb domain 
(RIIIb) (54). Later, in similar types of childhood cancers, 
a similar pattern of somatic mutations was also identified 
in the corresponding residues (E1147, D1151) of the RIIIb 
in DROSHA (56-59). Functional analyses revealed that the 
mutations in DICER lead to less effective generation of 
5p-miRNAs (60-62), whereas mutations in DROSHA af- 
fect the generation of miRNAs from both pre-miR NA arms 


(56), as reviewed in (54,63,64). Very recently, it was also 
shown that the recurring mutation in the RNase IIIa do- 
main (RIIa) of DICER/ occurring predominantly in uter- 
ine carcinoma may cause the same effect as the mutations 
in RIIIb (65). Interestingly, the occurrence of DROSHA 
mutations in Wilms’ tumor coincides with the occurrence 
of mutations in S/X/ and SIYX2, transcription factor genes 
that are also frequently mutated in the tumor (66). An- 
other hotspot mutation commonly occurring in Wilms’ tu- 
mor is E518K in the double-stranded RNA-binding do- 
main (dsRBD) of DGCR8 (57,58,66). It was also shown 
that in cancers with a high rate of microsatellite instability 
(MSI), such as colon, gastric, and endometrial tumors, spe- 
cific indel hotspots occur in TRBP and C-terminal positions 
of XPOS (67,68); however, these mutations were not further 
analyzed in other studies. Knowledge of the germline and 
somatic variation in miRNA biogenesis genes is summa- 
rized in (64). Additionally, SMAD4, encoding the SMAD4 
transcription factor activating many genes in response to 
transforming growth factor beta (TGFB)/bone morpho- 
genetic protein (BMP) signaling (69,70), is a well-known 
tumor suppressor gene that is highly mutated in many can- 
cers, including pancreatic and colorectal cancers (71). Al- 
though SMAD4 was also implicated in the transcription 
of miRNA genes (25,72-74), the effect of SMAD4 muta- 
tions has never been tested in the context of the activation of 
miRNA genes. Furthermore, it was shown that some SNPs 
in miRNA biogenesis genes are associated with the risk of 
various cancers. Examples include (i) the rs3742330 (A>C) 
SNP located in the 3’ UTR of DICER] that affects DICER1 
mRNA stability and is associated with susceptibility and 
malignancy in gastric cancer (75,76), increased survival of 
T-cell lymphoma patients (77) and lower prostate cancer 
aggressiveness (78); (11) the rs78393591 SNP in DROSHA 
and rs114101502 SNP in ZCCHCI1 (TUTase responsible 
for pre-miRNA uridylation and subsequent DICER 1 cleav- 
age inhibition) associated with the risk of breast cancer 
(79) and (iti) the rs11786030 and rs2292779 SNPs in AGO2, 
rs9606250 SNP in DGCR8, and rs1057035 SNP in DICERI 
associated with the survival of breast cancer patients (80). 
Additionally, the SNPs rs2740348 C>G and rs7813 C>T 
in GEMIN4, a gene involved in miRISC formation and 
miRNA-duplex unwinding, were implicated in the risk of 
several cancers, although the results were not conclusive 
(81-85). 

In this study, we took advantage of the data generated 
within The Cancer Genome Atlas (TCGA) project to an- 
alyze the somatic mutations in miRNA biogenesis genes. 
As a result, in a wide panel of 33 cancer types consisting 
of over 10 000 samples, we identified hundreds of muta- 
tions and many recurrently mutated hotspot positions and 
showed that some of the genes are specifically overmutated 
in particular cancer types. We also confirmed the common 
occurrence of deleterious mutations in SMA D4 and further 
characterized the specific hotspot mutations in SMAD4, 
SMAD2 and DICER], the last group of which were previ- 
ously reported mostly in childhood cancers. We followed up 
on the consequences of some of the mutations and showed 
characteristic changes in miRNA profiles resulting from 
specific mutation types in DICERI, SMAD4 and SMAD2. 
We also showed the associations of the mutations with can- 


2207 Auenuqa4 0} uo ysanB Aq gegg909/109/z/6y/el0!We/4eU/WO0o‘dnosIWapese//:sdyy Woy pepeojumog 


cer characteristics and patient survival. Additionally, the 
specific hotspot mutations in DROSHA and DGCR8 com- 
monly observed in Wilms’ tumor and other childhood can- 
cers were absent in adult cancers. 


MATERIALS AND METHODS 
Data resources 


We used molecular and clinical data (Level 2) for 33 can- 
cer types generated and deposited in the TCGA repository 
(http://cancergenome.nih.gov). These data included the re- 
sults of somatic mutation calls in whole-exome sequencing 
(WES) datasets of 10 369 samples (later limited to 10 255) 
analyzed against matched normal (noncancer) samples with 
the use of the standard TCGA pipeline. Hypermutated sam- 
ples were defined as samples with >10 000 mutations in 
the whole exome. As in general, the TCGA datasets include 
only one sample from each cancer specimen, in the analy- 
sis, we did not consider cancer stromal heterogeneity. Copy 
number data were obtained via Xena UCSC as a ‘gene-level 
copy number (gistic2 thresholded)’ dataset of the TCGA 
Pan-Cancer (PANCAN) cohort. The crystal structure of 
the phosphorylated SMAD2/SMAD64 heterotrimeric com- 
plex (PDB code: 1U7V) (86) was visualized with the use of 
PyMOL (Schrodinger, LLC, New York, NY, USA). 


Data processing 


We analyzed somatic mutations in 29 miRNA biogenesis 
genes (coding exons were extended by 2 nt on each side 
to enable identification of definitive splicing mutations). 
The genomic coordinates of the analyzed genes/regions are 
shown in Supplementary Table S1. From the WES data gen- 
erated with the use of four different algorithms (MuSE, 
MutTect2, VarScan2 and SomaticSniper), we extracted so- 
matic mutation calls with PASS annotation. The extraction 
was performed as described in (45) with a set of in-house 
Python scripts available at (https://github.com/martynaut/ 
mirnaome_somatic_mutations). Briefly, the lists of somatic 
mutations detected by different algorithms were merged 
such that variants detected by more than one algorithm 
were not multiplicated. To further increase the reliability 
of the identified somatic mutations (and avoid the identi- 
fication of uncertain mutations), we additionally removed 
those that did not fulfill the following criteria: (i) at least 
two alternative allele-supporting reads in a tumor sample 
(if no alternative allele-supporting read was detected in the 
corresponding normal sample); (i1) at least a 5x higher fre- 
quency of alternative allele-supporting reads in the tumor 
sample than in the corresponding normal sample; (iii) a so- 
matic score parameter (SSC) > 30 (for VarScan2 and So- 
maticSniper) and (iv) a base quality (BQ) parameter for al- 
ternative allele-supporting reads in the tumor sample >20 
(for MuSE and MuTect2). All mutations were designated 
according to HGVS nomenclature at the transcript and pro- 
tein levels, and the effects of mutations were predicted us- 
ing the Ensembl Variant Effect Predictor (VEP) tool (87). 
For visualization of mutations on the gene maps, we used 
ProteinPaint from St. Jude Children’s Research Hospital — 
PeCan Data Portal (88). The protein domains visualized on 
gene maps were positioned according to UniProt (89) 
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miRNA expression analysis 


We obtained miRNA expression data via Xena UCSC 
as batch-effects normalized data for TCGA Pan-Cancer 
data (note that due to normalization, some miRNA levels 
were below 0). Expression data from the set of ~700 
miRNAs (annotated in miRBase) were filtered to exclude 
miRNAs with undetectable signals (level = 0) in more 
than 30% of pan-cancer samples (or 10% when the analysis 
was performed for specific cancers). As high-confidence 
miRNAs, we considered those deposited in MirGeneDB 
v2.0 which were defined based on criteria that include 
careful annotation of the mature versus passenger miRNA 
strands and evaluation of evolutionary hierarchy (90,91). 
Next, to enable pan-cancer comparisons, we normalized 
the variation (range of —1 to 1) and median (median 
= 0) of miRNA levels to be equal in each cancer type. 
The normalized miRNA level changes (in pan-cancer) 
were calculated/expressed as differences, and the changes 
in raw (non-pan-cancer-normalized) miRNA levels (in 
individual cancers) were calculated/expressed as log» fold 
changes. For the differential analysis of miRNA levels, 
normalized against the level of miR-45la (see Results) in 
UCEC, we additionally used miRNA level data expressed 
only as reads_per_million_ miRNA mapped (RPM) values 
(non-batch effects normalized), downloaded from Fire- 
Browse _ (illuminahiseq_mirnaseq-miR_gene_expression). 
The analysis of isomiRs was performed with the use of 
the miRNA isoform dataset (illuminahiseq_mirnaseq- 
miR_isoform_expression) from FireBrowse. 


Statistics 


Unless stated otherwise, all statistical analyses were per- 
formed with statistical functions in the Python module 
scipy.stats. Particular statistical tests are indicated in the 
text, and if not stated otherwise, P < 0.05 was considered 
significant. If necessary, P-values were corrected for multi- 
ple tests. Mutation density was calculated as the number of 
detected mutations divided by the length of analyzed genes 
(total length of all coding exons). For patient survival anal- 
yses, we used a log-rank test (from the lifelines v0.24.8 li- 
brary (Davidson-Pilon C et al., https://zenodo.org/record/ 
3833188#.X8fbMsIlKhPY)). To determine the direction of 
mutation effects on survival, we used Cox’s proportional 
hazard model. Survival plots were created using Kaplan- 
MeierFitter from the lifelines library. A comparison of the 
occurrence of mutations in miRNA biogenesis genes among 
cancer stages (with correction for cancer type) was per- 
formed with the Cochran—Mantel—Haenszel test for pan- 
cancer and Fisher’s exact test for specific cancers. 


RESULTS 


Distribution of somatic mutations in miRNA biogenesis genes 
across different cancer types 


For analysis, we selected 29 miRNA biogenesis genes en- 
coding proteins playing roles in (i) the transcription of pri- 
mary miRNA precursors (pri-miR NAs), (ii) pri-miRNA to 
pre-miRNA processing in the nucleus, (iii) the export of pre- 
miRNA from the nucleus to the cytoplasm, (iv) pre-miRNA 
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Table 1. List and characteristics of the selected miRNA biogenesis genes 





Gene/protein ID (alias) 


Nuclear steps of miRNA biogenesis 


miRNA-related function 





SMAD4 


Activates miRNA precursor transcription upon TGFB/BMP activation (74,132,136) 


























FUS Facilitates cotranscriptional DROSHA recruitment to pri-miRNAs; shown to bind a terminal 
loop of specific neuronal pri-miR NAs and to enhance their processing (141) 

SRSF3 (SRP20) Enhances mammalian pri-miRNA processing upon binding to the CNNC motif in the 3p 
flanking sequence (142) 

DROSHA RNase IT; catalyzes pri-miRNA to pre-miRNA for processing/cleavage (11,143,144) 

DGCR8 Cofactor of DROSHA; coordinates the recognition of pri-miRNA at the dsRNA-ssRNA 
junction; functions as a molecular anchor and direct DROSHA to cleave pri-miRNA ~11 bp 
before the junction (11,145) 

DDXS (P68) Plays a role in recognition/binding of pri-miRNAs by the DROSHA complex; recruits 
DROSHA/DGCRS to pri-miRNA (146,147) 

DDX17 (P72) Plays a role in recognition/binding of pri-miRNAs by the DROSHA complex (146,148) 

GSK3B Facilitates pri-miRNA binding by DROSHA and enhances DROSHA association with 
cofactors DGCR8 and P72 (149); DROSHA phosphorylation/stabilization (150) 

SMAD2 Accelerates pri-miRNA processing by DROSHA; in complex with SMAD4 activates miRNA 


precursor transcription upon TGFB/BMP signaling (72,136,137) 





Export to cytoplasm 








XPOS (EXP5) Plays a role in the nuclear export of pre-miRNA to the cytoplasm (151,152); facilitates the 
nuclear cleavage of clustered pri-miRNAs (153) 
RAN Interacts with XPOS; plays a role in the export of pre-miRNA to the cytoplasm (151) 





Cytoplasmic steps of miRNA biogenesis 





DICER1 (DICER) 


RNase II; catalyzes pre-miRNA to miRNA-duplex processing by cutting off the pre-miRNA 
terminal loop (11,154-156) 








TARBP2 (TRBP) Coordinates pre-miRNA recognition by DICER and the precision of DICER! cleavage 
(157,158) 

PRKRA (PACT) Coordinates pre-miRNA cleavage by DICER] and assures the precision of the cleavage; plays a 
role in miRISC assembly and thus participates in miRNA stabilization and accumulation in the 
cell (158,159) 

ADAR Double-stranded RNA-specific adenosine deaminase; plays a role in pri- and pre-miRNA stem 


editing (A to I), which makes miRNA precursors resistant to DICER] cleavage (160) 





KHSRP (FUBP2, KSRP) 


LIN28A and LIN28B 


Binds to the terminal loop sequence of a subset of miRNA precursors, promoting their 
maturation (161) 


Bind to the terminal loops of specific pre-miRNAs (including pre-let-7 and pre-miR-9) and, 
upon recruitment of ZCCHC11 or ZCCHC6 that induces pre-miRNA uridylation, inhibit 
DICER processing (17,162,163) 





ZCCHC11 (TUT4) and 


Play a role in pre-miRNA uridylation and thus inhibit DICER1 processing (17,164) 























ZCCHC6 (TUT7) 

DIS3L2 Exoribonuclease; targets the uridylated let-7 precursors (16) 

miRNA functioning 

AGO1, AGO2, AGO3, and Play a role in miRISC formation/loading (summarized in (165,166)); catalytically active AGO2 

AGO4 functions as an endonuclease upon complementary mRNA:miRNA interaction (167) 

GEMIN4 Binds to the miRNA guide strand and facilitates the formation of a miRISC by unwinding the 
miRNA duplex (83,168) 

MOV10 Upon interaction with the miR NA-loaded AGO-protein complexes, plays a role in mRNA 
degradation; present in P-bodies (169,170) 

FMRI Interacts with DICER1 and AGO1 during mRNA degradation (171); controls DROSHA 
expression (172) 

TNRC6A (GW182) Component of P-bodies; plays a role in mRNA degradation upon interaction with Argonaute 


proteins (173,174) 
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processing and miRNA maturation in the cytoplasm and 
(v) miRNA:target recognition/interaction and regulation 
of downstream silencing effects (Table | and Figure 1). 

To identify somatic mutations in the miRNA biogene- 
sis genes, we took advantage of WES datasets of 10 369 
paired tumor/normal samples generated within the TCGA 
project. The collected samples cover 33 different cancer 
types (analyzed together as a pan-cancer cohort) (Supple- 
mentary Table S2). The list of all cancer types (full names 
and abbreviations) is shown in Figure 2 (to avoid confu- 
sion, we will use the abbreviations for the TCGA sample 
sets but not generally for particular types of cancer; in the 
latter case, we will use full cancer-type names). Applying the 
rigorous criteria described in the Materials and Methods, 
we identified a total of 5483 mutations in the pan-cancer 
cohort. However, a substantial fraction (n = 1834, ~30%) 
of the mutations were identified in a relatively small num- 
ber (x = 114, ~1%) of hypermutated samples. As shown in 
Supplementary Figure $1, the number of mutations in hy- 
permutated samples strongly depends on the general bur- 
den of mutations in these samples, implying enrichment 
of random, most likely passenger mutations in the hyper- 
mutated samples. Therefore, to reduce the proportion of 
confounding mutations, we removed hypermutated sam- 
ples from subsequent analyses. The removed, hypermutated 
samples originated mostly from SKCM, UCEC and COAD 
(Supplementary Figure S1). 

After removing the hypermutated samples, we continued 
analysis on 10 255 samples with 3649 mutations, including 
2196 (60%) missense mutations, 774 (21%) synonymous mu- 
tations and 625 (17%) definitive deleterious mutations, con- 
sisting of 341 frameshift, 222 nonsense and 62 splice-site 
mutations (Supplementary Table $3). Other types of mu- 
tations, such as start or stop codon mutations or complex 
mutations, were also present in small fractions. At least one 
mutation was detected in 2104 (21%) samples, including nu- 
merous samples with more than one mutation (Figure 2A 
and B). The frequency of mutated samples differed substan- 
tially among cancer types, ranging from 45% (SKCM) to 0% 
(KICH) (Figure 2A, Supplementary Table S3), and roughly 
corresponded to the mutational burden in particular cancer 
types. 


Frequency of somatic mutations in miRNA biogenesis genes 
across 33 cancers 


As shown in Figure 2C, TNRC6A, DICERI and SMAD4 
are among the most highly mutated genes (~3% of sam- 
ples in pan-cancer), and the least mutated genes (~0.3%) are 
LIN28A, RAN and SRSF3. Although there is some corre- 
lation between the frequency of mutations in the particular 
genes and the length of their protein-coding sequences (R? 
= 0.62), the overall mutation frequency cannot be simply 
explained by the length of the genes. Additionally, none of 
the genes are well-known highly mutated genes or are lo- 
cated in late-replicating regions known to be overmutated 
in cancer (92). 

Although there is a general correlation of mutation fre- 
quency in particular genes between cancers, there are also 
apparent striking exceptions of genes specifically overmu- 
tated in particular cancers (Figure 2C). This contrasts with 
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observations that over- and undermutated regions generally 
overlap between different cancer types (92) and suggests the 
nonrandom occurrence and potentially functional nature of 
the overmutations. To identify overmutated genes, we sta- 
tistically compared the frequency of mutations (overmuta- 
tion) in particular genes (versus all other genes) in partic- 
ular cancers with corresponding frequencies in pan-cancer 
(Figure 2C, Supplementary Table S4). The most striking 
example of an overmutated gene is SMVAD4 overmutated 
in PAAD (22% of samples), READ (16%), COAD (14%), 
STAD (9%) and ESCA (8%). Other interesting examples in- 
cluded AGO4, LIN28A and SMAD2 overmutated in BLCA 
(an otherwise extremely low-mutation cancer with no mu- 
tations in other genes); LIN28B overmutated in SKCM; 
PRKRA overmutated in OV; and DDX5 overmutated in 
BRCA and KIRP. There are also other genes with increased 
mutation frequency, e.g. TNRC6A in STAD and DICERI 
in SKCM and UCEC (nominal P > 0.005), but these over- 
mutations are only nominally significant. Consistent with 
the above findings, the overmutated genes are outliers in 
terms of the correlation of mutation frequencies in particu- 
lar genes between particular cancer types and the remaining 
pan-cancer (Supplementary Figure S2). 


Distribution of mutations in the miRNA _ biogenesis 
genes—identification of hotspot mutations 


To illustrate the mutation distribution along the protein se- 
quences, all the mutations were visualized in lollipop plots 
(Figure 3 and Supplementary Figure $3). As shown in 
the plots, although most of the mutations are quite evenly 
distributed along the genes, there are also some hotspot 
regions/mutations suggesting the functional nature of these 
changes. The most striking example is a cluster of eight 
hotspots of recurrently mutated amino acid (AA) residues 
(i.e. D351, G352, D355, P356, R361, H382, G386 and 
D537) occurring in the MH2 domain of SMAD4 (Fig- 
ure 3A). The most prominent hotspot position is R361, 
which by itself acquired 37 missense mutations, account- 
ing for 23% of all SMAD4 missense mutations. There are 
also two recurrently mutated AA residues (i.e. P305 and 
R321) in the MH2 domain of SMAD2 (Figure 3B). The 
hotspot mutations in the MH2 domains (both in SMAD4 
and SMAD2) likely affect SMAD4:SMAD2 heterotrimer 
formation. Additionally, there are two recurrent protein- 
truncating nonsense mutations in the MH2 domain of 
SMAD2, i.e. p.S306Ter and p.S464Ter, the latter of which 
occurs 13 times is localized in the last exon of SMAD2 and 
likely truncates the last five AAs of the protein just before 
two phosphorylation sites (S465 and S467) critical for acti- 
vation of SMAD2 upon TGFB/BMP signaling (93). 

Two other clusters of missense hotspot mutations are lo- 
cated in metal ion-binding residues of the RIIJa (S1344) 
and RIIIb (E1705, D1709, D1810 and E1813) domains of 
DICER! (Figure 3C). These hotspots were previously de- 
tected and functionally characterized in various pediatric 
cancers (54,64), thyroid adenomas (94), and the TCGA co- 
hort, mostly in UCEC samples (65). 

It is also worth noting the hotspot missense mutation, 
Le. p.P127L in the DRBM 2 domain of PRKRA (Figure 
3D), occurring 13 times in OV, COAD, GBM and LUAD 
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Figure 1. Schematic depiction of miRNA biogenesis and functions of the miRNA biogenesis proteins/genes involved in the subsequent steps indicated 


and described in Table 1. 


(6, 3, 3 and | mutations, respectively). As the DRBM 2 do- 
main plays a role in interaction with other proteins (e.g. pro- 
tein kinase R, PKR), the mutation may affect the interac- 
tions. Consistently with the frequent occurrence of the mu- 
tation in OV, the changes in the PRKRA level were recently 
linked with the resistance of mucinous ovarian cancer to the 
miR-515-3p dependent platinum-based (oxaliplatin) treat- 
ment (95). Although the mutation overlaps with the SNP 
(ts75862065), the fact that it occurs predominantly in OV, 
which is otherwise a moderately mutated cancer, combined 
with the overall overmutation in PRKRA, argues against 
the accidental occurrences of the mutation as artifacts of the 
mutation calling process. There are also two hotspot syn- 
onymous mutations in FUS (p.G222= and p.G227=) (Sup- 
plementary Figure $3). A role of such hotspots cannot be 
excluded (96); however, we did not investigate them further 
in this study. Similarly, recurring missense mutations in the 
other genes may be important for specific cancers; however, 
they occur much less frequently. 

There are also some hotspot indel mutations, eg., 
p.N58lfsTer8 in ZCCHCII, p.Y948MfsTerl6 in 
ZCCH C6, p.K121SfsTer8 and p.G163EfsTer20 in DDX17, 
p.W804GfsTer99 and p.R1183GfsTer7 in TNRC6A, 
p.L17CfsTer99 in AGOJ, and pA603RfsTer71 in AGO2 
(Figure 3 and Supplementary Figure S3). However, it must 
be noted that the exact position of indel hotspot mutations 
may not be necessarily driven by cancer advantage but by 
sequence properties (e.g., the presence of short tandem 
repeat motifs). In fact, many indel mutations and some of 
the indel hotspots occur at sequence motifs often mutated 
as a result of MSI. Additionally, as most indels in coding 
sequences result in frameshifts and premature termination 
of translation, triggering nonsense-mediated mRNA decay 
(NMD) and leading to the complete loss of mRNA, 
the exact position of indel hotspots may not be that 
important. 


In the next step, we looked for hotspot mutations previ- 
ously observed in the miRNA biogenesis genes, i.e. (1) E969 
and E993 in RIIIa and E1147, D1151, Q1187 and E1222 in 
RIIb in DROSHA, (ii) p.E518K in the dsRDB1 domain in 
DGCRS and (i11) p.R440Ter in XPOS observed in different 
pediatric cancers, especially in Wilms’ tumor (56,57,59). Of 
these mutations, we found p.E518K in DGCR8 occurring in 
two of 495 (0.4%) cases of THCA, p.R440Ter in XPOS in 
one case of UCEC and one case of SKCM, and p.DII51E 
in DROSHA in one case of COAD (Supplementary Figure 
S3). Of the indel hotspots in YPOS and TARBP2 detected 
previously in colon, gastric, and endometrial tumors with 
MSI (67,68), we detected only two cases of the C insertion 
in the poly-C track (p.M145HfsTer13) in TARBP2: one in 
UCEC and one in STAD (both cancers often characterized 
by MSI). 


Functional consequences of SA D4 and SMA D2 mutations 


From the visual investigation (Figure 2C and Figure 3A), 
it is apparent that SMAD4 is the gene with the highest 
density of mutations (273 mutations, 160 mut/kbp) and 
the largest proportion of deleterious mutations (33%). As 
mentioned in the previous paragraph, there are also eight 
hotspots of missense mutations, seven of which are lo- 
cated in a relatively small region (D351 to G386) of the 
MH2 domain, playing a role in heterotrimer formation with 
other receptor-dependent SMAD proteins (R-SMADs, e.g. 
SMAD2) to mediate the TGFB/BMP transcriptional re- 
sponse (97,98). The collection of a relatively large number 
of mutations associated with different cancer types reveals 
that the proportion of hotspot and deleterious mutations [in 
pan-cancer, 71 (44%) versus 91 (56%), respectively] differs 
substantially between cancers (Figure 4A) and is the high- 
est in READ (86% versus 14%; P = 0.001) and the lowest 
in PAAD (27% versus 73%; P = 0.032). This may suggest a 
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Figure 2. Mutation frequency in miRNA biogenesis genes across the analyzed cancer types. (A) The total number of samples (black bars) and percentage 
of samples with mutations (blue bars) in the panel of miRNA biogenesis genes. (B) The proportion of samples with different numbers of mutations. 
Each sample is shown as a dot (only samples with at least one mutation are shown). Due to the large number of samples with a particular number of 
mutations, dots overlap with each other. (C) Heatmap showing the frequency [%] of mutations within each of the miRNA biogenesis genes (y-axis) in 
different cancer types (x-axis). The genes are ordered by the frequency of mutations in pan-cancer (first column). The genes significantly overmutated in 
particular cancer types are marked with a color frame indicating the nominal P-value (Fisher’s exact test; the P-value scale is indicated under the heatmap). 
A P-value <0.00005 (red frame) corresponds to significant enrichment after correction for multiple tests (adjusted P <0.05). Specific P-values are shown 
in Supplementary Table S4. 
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Figure 3. Distribution of the identified mutations in the miRNA biogenesis genes. (A-F) depict SMAD4, SMAD2, DICERI, PRKRA, DOX17 and ZC- 


CHC11, respectively. The remaining miR NA biogenesis genes 
along the gene coding sequences, with the exon structure and 


are shown in Supplementary Figure $3. Mutations are visualized in the form of lollipop plots 
protein functional domains indicated. The size of a mutation symbol (circle) is proportional 


to the number of mutations, and the color indicates the type of mutation (as shown in the legend). All mutations were annotated according to HGVS 
nomenclature, and the effect of the mutations at the protein level was denoted with the VEP tool (Ensembl). 


different role of SMAD4 in different cancers and different are located on the surfaces of the SMAD4:SMAD2 inter- 
effects of these two types of mutations. Most of the hotspots action, which is consistent with the notion that hotspot 
(five of eight hotspots, 83% of all hotspot mutations) coin- mutations prevent SMAD complex formation and thus 
cide with charged (basic or acidic) AAs, affecting the elec- avert transcription of SMAD-controlled genes in response 
trostatic properties of MH2 important for interaction with to TGFB/BMP signaling. As in SMAD4, the SMAD2 
R-SMADs (86). To visualize the location of the hotspot AA hotspot residues localize at the SMAD4:SMAD2 interac- 
residues, we marked them on a crystal structure of the het- tion surfaces (Figure 4B and Supplementary S4). 

erotrimeric SMAD4:(SMAD2), complex (86). As shown The copy number analysis of SMAD4 showed that 
in Figure 4B and Supplementary S4, all hotspot residues SMAD4 deletions are significantly more frequent in sam- 


Zz0z Aseniqa4 Q|, uo ysanB Aq eggg909/L09/z/6y/alo1e/4eU/Woo'dno‘siwapese//:sdyy WO pepeo|uMog 


Nucleic Acids Research, 2021, Vol. 49, No.2 609 





p=0.001 0.116 0.126 0.756 0.349 0.032 


A 
100 
60 
7 8 
90 
@ 
0 






% of mutations 


240 25 
SMAD2/SMAD4 surface 








9 15 mutations: 
mhotspot 
= deleterious 


SMAD2 


PAN-C READ STAD COAD ESCA LUAD PAAD other 
















































































































































































































































































Cc p=3x10" 2x10" 1x10" 
100 
a 9 a 
80 
6519 copy number: 
3g 60 @ normal or gain 
5 BH copy lost 
2 40 
6 
x 
20 
SMAD2/SMAD2 surface 
no SMAD4 all hotspots R361 deleterious SMAD4/SMAD2 surface 
D mutations 
all hotspot mutations _ R361 mutations deleterious mutations 
oe Wl 
iR-196b, | let-7c Re 
re mIR 23a a 4 . _ 44 whee 
= = = 
s J s | S 
= 2 > 
‘ 3 miR-1247 gs, | . 25] . 
’ ac e : let-7e 
7 ne miR-210 8 
. 
1 iR-125b 
miR-1247206 , | “m Jt 
miR-24 ° miR-27a 24 . 24 ie 2] miR-246, $6 
ee miR-23a a, = o.* ¥ 
imaonne’S as | A 
L See ite goawanesadloseeiqusestySiaedealy essa miR-494 Sot... LL Sresneisaces 
We 1 < 
5p miRNA | 
e 3p miRNA 
not specified, | 
; 1 
-0,1 
difference difference difference 
E pan-cancer F PAAD COAD READ 
1 | a b ' 13 
‘ : : 2 2 ; 
8 0s _ 4 12 f : . 
co 3 
SY g 40 soon Q 
Es , a 1 o cs 10 : 
=e 0} 9 ¢ ——— = 
82 2 10 : 8 ee : 
§3 3 % = . s 
q 05 I 8 ; = ; 
y 6 
tl iy | : . 
-14 = : 
1 2 e 
oF 7 14 14 
eo) al al 
fg 08 Se gio ie 
33 = 3 te 42 on 12 
58 3? ate _—— : 
z a 0 — 86 = i 10 es 
ee = 4 iy teaenessa 
oe e 8 s 
gE 05 Eo 8 “ _ 
g 
| ok 0 6 
4 alt ‘ 6 
a2 ¢ 6 3 8 8 
$3 Bog € 38 € Hi PAAD COAD 
23 2 x) °8 3 — all hotspots — all hotspots 
ee = 3 i= 
oO 
B a : 
G pan-cancer pan-cancer ° 
1 — all hotspots — deleterious mut 
c — no SMAD4 mut — no SMAD4 mut 0 
oa 6 1000 2000 0 1000 2000 3000 4000 
3 PAAD READ 
g — deleterious — all hotspots 
tg mut 
o 
g 8 
3 
0 0 
0 2500 +5000 + 7500 10000 0 2500 5000 7500 10000 7) 7500 2000 01000 2000 3000 
timeline [days] timeline [days] timeline [days] timeline [days] 
Figure 4. Characteristics of SMAD4 mutations. (A) The proportions of hotspot and deleterious mutations (y-axis) in different cancer types (x-axis). (B) 
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ples with SMA D4 mutations than in samples without such 
mutations, indicating frequent loss of the second allele 
[loss of heterozygosity (LOH) effect] in samples with both 
hotspot and deleterious mutations (Figure 4C). SMAD4 
mutations were previously shown to affect the expres- 
sion of many genes. Although it was proposed that the 
TGFB/SMAD pathway also controls the expression of 
miRNAs (25,27,28,73,99,100), the effect of the mutations 
on miRNA levels has never been tested. To do so at the 
pan-cancer scale, we took into account only 522 miRNAs 
whose level was >0 in at least 70% of tested TCGA samples. 
We normalized miRNA levels to make the median (equal to 
0) and variance of these levels comparable between cancers. 
As shown in Figure 4D, there is an excess of downregulated 
miRNAs in samples with hotspot mutations (e.g. at the level 
of P-value < 0.05, 78% and 22% of miRNAs are downreg- 
ulated and upregulated, respectively), which is consistent 
with the expected impairment of SMAD complex forma- 
tion and transcription factor activity. A similar effect was 
observed when the analysis was performed only for R361, 
the most frequently mutated hotspot residue (Figure 4D), 
and when the analysis was limited to the high-confidence 
miRNAs (annotated in MirGeneDB; Supplementary Fig- 
ure SSA), which further strengthens the reliability of the 
observation. Among the downregulated miR NAs (Supple- 
mentary Table S5), there are many miRNAs with well- 
documented cancer-related functions, for example, the re- 
cently discovered suppressormiRs miR-23a-3p (reviewed 
in (101)), miR-196b-S5p (102,103), and miR-1247-5p (104). 
Noteworthy, such an effect of excessive miRNA downreg- 
ulation is not visible for deleterious mutations (Figure 4D 
and Supplementary Figure SSA). Among the most signif- 
icantly downregulated or upregulated miRNAs in samples 
with deleterious mutations are let-7c-S5p and miR-125b-5p 
or miR-215-5p, miR-21-5p and miR-210-5p, respectively, 
all well-known cancer-related miR NAs (e.g. (105)). The fre- 
quent observation of pairs of 5p and 3p-miRNAs (e.g. let- 
7c-Sp and let-7c-3p) and groups of miR NAs generated from 
miRNA clusters (e.g. miR-99a/let-7c or miR-1/133a clus- 
ter) among the consistently altered miRNAs in samples 
with both hotspot and deleterious mutations indicates that 
at least some of the miRNA alterations result from aberra- 
tions at the transcriptional level. As shown in Figure 4E and 
F, the levels of let-7c-5p and miR-196-5p, which are exam- 
ples of miRNAs most significantly altered in pan-cancer, of- 
ten show similar alterations in the most commonly mutated 
cancers, 1.e. PAAD, COAD and READ, although it must 
be noted that due to the very limited number of mutations, 
the results in individual cancers are not perfectly consis- 


< 


tent and have to be interpreted with caution. Nonetheless, 
among the miRNAs downregulated by either hotspot or 
deleterious SMAD4 mutations are many miRNAs demon- 
strated before to be activated by TGFB/SMAD signaling. 
It includes miR-23a-3p, miR-27a and miR-24 constituting 
a miRNA cluster that is the first and most well-studied 
group of miRNAs regulated by SMAD4 (74), miR-181b- 
5p (106), 199a-3p (73) and miR-494-3p (107). To iden- 
tify pathways/processes enriched in the genes regulated by 
the downregulated miRNAs, we performed KEGG path- 
way enrichment analysis with miRPath v3.0. The analysis 
showed that miRNAs downregulated both by the hotspot 
and deleterious SMAD4 mutations are associated (adjusted 
P < 0.01) with similar cancer-related processes, including 
‘Pathways in cancer’, ‘ErbB signaling pathway’, ‘Glioma’, 
and ‘Proteoglycans in cancer’ (Supplementary Table S6). It 
is worth noting that among the enriched pathways is also 
the ‘TGF-beta signaling pathway’ regulated by 32 of the 49 
(65%, P = 0.0002) and 20 of the 29 (69%; P = 0.00005) miR- 
NAs downregulated by the hotspot and deleterious muta- 
tions, respectively. 

A comparison of SMAD4 mutations with clinical can- 
cer data did not show an association of mutations with tu- 
mor staging but showed an association with decreased over- 
all survival (OS) at the pan-cancer level (log-rank test: P = 
0.049 and P = 0.0006 for hotspot and deleterious mutations, 
respectively) (Figure 4G). As the analyses performed at the 
pan-cancer level may be affected by the unequal distribution 
of mutations among cancer types, we repeated the survival 
analysis with the most frequently mutated individual can- 
cers. Although the analyses of individual cancer types were 
of very limited statistical power because of a low number of 
mutations, some cancer types also showed trends toward de- 
creased survival of patients with the mutations, i.e. COAD 
and READ (Figure 4H). 

Although SMAD2, with a total of 101 mutations, in- 
cluding 35 deleterious mutations, is much less densely mu- 
tated (70 mut/kb) than SMAD4 (Figure 3B), it also con- 
tains recurrently mutated hotspot AA residues in the MH2 
domain. The hotspots include two AA residues, i.e. P305 
and R321 mutated 3 and 4 times, respectively, and the non- 
sense mutation p.S464Ter truncating the protein by five 
AAs, which is the most common SMAD2 mutation (13 oc- 
currences in cancers such as COAD, STAD, and BRCA). 
As in SMAD4, the SMAD2 hotspot residues localize at the 
SMAD4:SMAD2? interaction surfaces (Figure 4B and Sup- 
plementary Figure S4). 

We separately analyzed miRNA level profiles in samples 
with hotspot missense mutations (NV = 7), the p.S464Ter 





surfaces (as indicated in the figure). Mutated positions in SMAD4 are indicated in pink, whereas SMAD2 mutations are indicated in blue. (C) The pro- 
portion of copy number alterations of SMAD4 (y-axis) in samples with different types of SMAD4 mutations. (D) Volcano plots depicting miRNA level 
alterations in samples with different types of SMAD4 mutations (indicated above the graphs) compared to samples without any SMAD4 mutations. Pos- 
itive and negative values on the x-axis indicate increased and decreased miRNA levels in mutated samples, respectively. The y-axis indicates the P-value 
on a logio scale, and the P = 0.05 threshold is indicated by a pink dashed line. The y-axis indicates the logig P-values. Blue, pink, and gray dots indicate 
Sp-miRNAs, 3p-miRNAs and miRNAs whose arm was not specified, respectively. The IDs of selected miRNAs are indicated on the graphs. (E) Boxplots 
showing the distribution of the selected miRNA levels (y-axes) in samples with different types of mutations versus samples without any SMAD¢4 mutations 
(x-axes). The P-values <0.05 are indicated on the graphs. (F) Violin plots showing the distribution of non-pan-cancer-normalized levels (y-axis) of the 
miRNAs shown in E in samples without SMAD4 mutations and specific samples with different types of SMAD4 mutations (dots of different colors). 
Horizontal lines indicate median miRNA levels in samples with different types of mutations. (G) and (H) Kaplan—Meier plots indicating the OS of patients 
without and with a specified type of SMAD4 mutation in pan-cancer and specific cancer types, respectively. 
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mutation (NV = 13), and deleterious mutations (not in- 
cluding p.S464Ter; N = 22). As in samples with SWAD4 
mutations, samples with p.S464Ter and deleterious muta- 
tions showed a substantial excess of downregulated miR- 
NAs [95% (34 of 36 altered miRNAs at P < 0.05) and 
70% (16 of 23), respectively]) (Figure 5A and Supplemen- 
tary Table S5). A similar effect was also observed in the 
group of high-confidence miR NAs (Supplementary Figure 
S5B). The excess of downregulations is not visible for the 
hotspot missense mutations; however, it must be noted that 
due to a very low number of hotspot mutations, the analy- 
sis of miRNA levels for this type of mutation is of very low 
statistical power. Nonetheless, we observed one miRNA, 
miR-329-3p, playing a role in different cancers (108-111)), 
which was consequently downregulated in all three mu- 
tational groups. Other downregulated miRNAs playing a 
role in cancer include (i) miR-1247-5p and miR-1247-3p (in 
samples with p.S464Ter), recently identified as tumor sup- 
pressor miRNAs (104), and observed as highly expressed in 
embryonic/placenta cells which may indicate their role in 
quickly dividing cells (112); (41) miR-7-5p (in samples with 
deleterious mutations), a well-recognized suppressormiR, 
playing a role in downregulation of the growth, metasta- 
sis, and prognosis of various tumors (reviewed in (113)); 
and (111) miR-380-5p (in samples with hotspot missense mu- 
tations), downregulating TPS3 to control cellular survival 
(114). As shown in Figure 5B and C, the examples of miR- 
329-3p and miR-380-5p (both members of the MIR-154 
family playing a role in pulmonary fibrosis and being a tar- 
get of TGFB signaling (115)) illustrate that miRNA level 
changes identified in the pan-cancer analysis are also re- 
flected in the most frequently mutated cancers, i.e. COAD 
and READ. SMAD2, similar to other R-SMADs, may also 
posttranscriptionally increase the level of a specific group of 
miRNAs, facilitating the processing of their pri-miRNAs by 
DROSHA (72). Among the 44 miRNAs shown to be up- 
regulated by R-SMADs or containing specific R-SMAD- 
interacting sequence motifs (23 of them tested in this study), 
miR-421 (a member of the MIR-95 family) (P = 0.01), miR- 
188-5p (P = 0.03), and miR-877-5p (P = 0.04) were down- 
regulated in samples with the p.S464Ter mutation. A com- 
parison of the SMAD2 mutations with clinical character- 
istics of cancers did not reveal an association of mutations 
with cancer stages, but p.S464Ter showed a trend toward 
decreased OS of patients with the mutations (Figure 5D) in 
pan-cancer and in the individual cancers with >1 p.S464Ter 
occurrence in informative samples. 

As shown in Figure 5E, although there is some overlap 
between miRNAs altered by the hotspot missense muta- 
tions in SMAD2 and SMAD4, generally the overlaps be- 
tween the effects of different mutation types are small. The 
small overlap, may mainly result from the low statistical 
power of the analysis and the fact that different muta- 
tions predominate in different cancer types (with different 
miRNA profiles), but it may also result from various in- 
volvement of the proteins in distinct regulatory processes, 
ie. (i) SMAD4 plays a role in the activation of miRNA 
transcription and in this process it may interact with differ- 
ent R-SSMADs (not only SMAD2; different mutations may 
differentially affect interactions with different R-SMADs) 
whereas (ii) SMAD2 (but not SMAD4) plays a role in post- 
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transcriptional regulation of miRNA precursors. Addition- 
ally, secondary regulatory processes may also play a role. 


Functional consequences of DICER] mutations 


Another highly mutated gene with several characteristic 
hotspots is DICERI, with a total of 272 mutations (46 
mut/kb), including 38 deleterious mutations and 27 mis- 
sense mutations located in the aforementioned hotspots in 
the RIIa (NV = 6) and RIIIb (VN = 21) domains (Figure 3C). 
As shown in Figure 6A, the occurrence of DICER1 muta- 
tions does not correlate with gene copy number alterations, 
and only a small fraction of the DICER/ mutations coincide 
with gene deletion or amplification. The miR NA expression 
analysis at the pan-cancer level showed that the hotspot mu- 
tations in both the RIIIa and RUIb domains were associ- 
ated with the global downregulation of 5p-miR NAs with si- 
multaneous upregulation of 3p-miR NAs (Figure 6B). This 
effect of Sp-miRNA downregulation was observed before 
and was explained by the fact that the hotspot mutations 
in RHIb but also in RIIIa (65) affect the activity of the 
RIIIb domain, preventing cleavage of miRNA precursors 
at the 5p arm and the release of 5p-miRNAs. The increase 
in 3p-miRNAs was considered an artifact, i.e. an apparent 
effect counterbalancing the global decrease in 5-miRNAs, 
resulting from the standard miRNA level normalization 
procedure that standardized the amount of each miRNA 
against the total number of miRNA reads. Surprisingly, we 
observed a similar effect of decreased Sp-miRNAs and in- 
creased 3p-miR NAs in samples with deleterious mutations 
that do not affect RIIb but are assumed to lead to com- 
plete loss of DICER1 (Figure 6 B and C; for comparison, 
please see the effect of SM/AD4 mutations in which 5p- and 
3p-miRNAs are more or less equally distributed between 
the decreased and increased miRNAs). The effect of the 
asymmetrical distribution of altered miRNAs was not ob- 
served for other nonhotspot missense mutations outside the 
RNase domains and synonymous mutations for which, as 
expected, miRNA level changes were very low (Figure 6C). 
To avoid potential biases associated with pan-cancer nor- 
malization, we performed expression analysis separately for 
UCEC, which is the cancer type with the most frequent mu- 
tations in DICER] (~9%), with 2 mutations in RIIIa, 10 
mutations in RIIIb and 9 deleterious mutations. Although 
of lower statistical power, the UCEC analysis showed a sim- 
ilar effect of the hotspot and deleterious mutations on glob- 
ally decreased levels of Sp-miRNAs and increased levels of 
3p-miR NAs (Figure 6B). To directly check whether the in- 
crease in 3p-miRNAs in samples with the DICER] muta- 
tions is an effect of normalization against the total number 
of miRNA-specific reads, we normalized the miRNA lev- 
els against the level of miR-45la used as a reference gene. 
miR-45la is a miRNA whose biogenesis is not dependent 
on DICER] processing (116); therefore, its level should not 
be affected by DICER1 mutations. As shown in Figure 6C, 
normalization against miR-45la in samples with hotspot 
mutations abolished the effects of neither 5p-miRNA de- 
creases nor 3p-miRNA increases. A consistent effect was 
observed when the analysis was performed on RPM (non- 
batch-effects normalized) miR NA levels and when the anal- 
ysis was narrowed down to the high-confidence miRNAs 


2207 Auenuqa4 0} uo ysanB Aq gegg909/109/z/6y/el0!We/4eU/W09‘dno'sIWapese//:sdyy Woy pepeojumog 


612 Nucleic Acids Research, 2021, Vol. 49, No. 2 
























































































otspot missense $464Ter leleterious (without er) 
A h i deleteri ithout S464T 
° 
miR-380 miR-7 D 
3 is _ 3 miR: Ae _ 3 
a a a ° 
= . = = 
> ° Ey 3 
> ° 7 T ° 
miR-1247 
a 2 2 
"3. , mIR-23a ae 
; ® : 
miR-424eMIR-188 © gmiR-32 (2 MIR-331 
. fee © MIR-329 ‘a miR-421°° 
ace ec nee lek see cs ceeeseaes lbeeateaaee £ 
e Sp miRNA 
e 3p miRNA 
not specified 
T T T T T T T T T T 
-0.3 -0.2 -0.1 0 0.1 0.2 0.3)  -0.3 -0.2 i ; -0.3 -0.2 -0.1 0 0.1 0.2 0.3 
difference difference difference 
COAD READ 
By. Cc 
3 
3 
2. ¥ 
2 4% ad = : 
Rs05 SoS 3 
2 2 ¥ x ff vt 3 2] ° 2 
«8 1 T a 
Ee a OS T & 
£2 9 3 
oc oO 
2s a I = e1 F 1 
o> = | . e 
£ 2 E 5 
= 05 ee 
——.—$—. 
0 : : —————— 
-1 





























































































































4 
2] 
fos : é 
rr) + 
3 505 ° ‘ ; 3 e = 
aS & E ’ a . 
es is T 2 c 
E§ 0 Pa : 1 
£a = 
gs = | ° 1 SS 
ae ae eae 
23 ‘i a 
£7.05 | 2 
= : 0 Se 0 eee 
-1 
ne ait o 7 o fy ® a os 
a bP ee 2 BE 
1 ee 
oS 2 2 Q = 
of E —€ @ 
i > 
an 
D pan-cancer COAD BRCA E all differential miRNAs 
sn — $464Ter a — 8464Ter ie — $464Ter eas oer 
@ — _ no SMAD2 mut Os = — no SMAD2 mut 08 — no SMAD2 mut deleterious hotspot missense 
= SMAD4 SMAD2 
Z 0.6 0.6 all hotspot 464ter 
a 
a 0.44 0.4 
oO 
> 
6 0.2 0.2 
0.0 0.0 
0 2500 5000 7500 10000 0 1000 2000 3000 4000 0 2000 4000 6000 8000 
timeline [days] timeline [days] timeline [days] 


Figure 5. Characteristics of SMAD2 mutations. (A) Volcano plots depicting miRNA level alterations in samples with the particular types of SMAD2 
mutations (indicated above the graphs) compared to samples without any SMAD2 mutations (graph scheme as in Figure 4D). (B) Boxplots showing the 
distribution of the selected miRNA levels (x-axes) in samples with different types of SMAD2 mutations vs. samples without SMA D2 mutations. (C) Violin 
plots showing the distribution of non-pan-cancer-normalized levels (y-axis) of the miRNAs shown in B in samples without SMAD2 mutations and specific 
samples with different types of SMAD2 mutations (graph scheme as in Figure 4F). (D) Kaplan—Meier plots showing the OS of patients with p.S464Ter 
and without any SMAD2 mutations in pan-cancer, COAD and BRCA. (E) Venn diagram showing the overlap between miRNAs altered by the indicated 
types of SMAD2 and SMAD4 mutations. 
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Figure 6. Functional consequences of DICER/ mutations. (A) The proportion of copy number alterations of D/CERI (y-axis) in samples with different 


types of DICER] mutations. (B) Volcano plots depicting miRNA level alterations in pan-cancer (first column) and U 





CEC before and after normalization 


against the miR-451a level (second and third columns, respectively) in samples with different types of DICERI/ mutations (indicated on the left). Blue, 
pink, and gray dots indicate 5p-miRNAs, 3p-miRNAs, and miRNAs whose arm was not specified, respectively. Other graph details as in Figure 4D. (C) 
Each graph shows the proportions of 3p- and Sp-miRNAs (y-axis) among the miRNAs downregulated and upregulated in samples with specified types of 
DICER! mutations (x-axis; at P < 0.05). The graphs (from the top) show the alterations in pan-cancer, and UCEC before and after normalization against 
the miR-451a level. (D) Above; a schematic representation of the analyzed isomiR fractions. Below; a box plot showing distributions of the main miRNA 
fractions of 49 3p-miRNAs highly expressed in UCEC (y-axis) in samples with the RIIIa and RIIIb mutations vs. samples with no DICER/ mutations 
(x-axis). On the right; examples of miRNAs whose main fraction was altered in samples with the DJCER/ mutations, against changes in the levels of 
upstream and downstream isomiRs (please note the different scale in the graphs). 
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(Supplementary Table SS and Supplementary Figure S5C 
and D, respectively). This indicates that hotspot mutations, 
as reported previously (60—62,65,94), decrease 5p-miRNA 
levels, but contrary to previous reports, they also increase 
the levels of numerous 3p-miRNAs. KEGG pathway en- 
richment analysis (Supplementary Table S6) showed that 
upregulated 3p-miRNAs, among others, are strongly asso- 
ciated with various cancer-related processes, such as “TGF- 
beta signaling pathway’, ‘Pathways in cancer’, ‘ErbB signal- 
ing pathway’ and ‘Hippo signaling pathway’, which were 
found among the top ten most significant associations (ad- 
justed P <0.00001). Consistent with the predicted loss-of- 
function effect of the deleterious mutations, after normal- 
ization against miR-451a, the deleterious mutations are as- 
sociated almost exclusively with a decrease in miRNA levels 
(Figure 6C). Although an excess of Sp-miR NAs is observed 
among the decreased miRNAs, there is also a substantial 
fraction (” = 52, 26%) of 3p-miRNAs. 

Finally, we checked whether the DICER/ hotspot muta- 
tions affect the 5’end heterogeneity of 3p-miRNAs and pro- 
portions of generated isomiRs. The heterogeneity of many 
miRNAs has been documented before (117-120) and it was 
shown that the proportion of isomiRs may differ substan- 
tially in different conditions, including cancer (117,121). To 
reliably estimate fractions of isomiRs, for analysis we took 
only 3p-miRNAs (” = 132) undergoing high expression 
(=>100 RPM) in UCEC. For each of the miRNAs, we com- 
pared the level of their isoforms with the most commonly 
generated 5’end (main miRNAs) versus level of minor (ei- 
ther upstream or downstream) isomiRs. As shown in Figure 
6D the mutations do not affect the global level of main miR- 
NAs, which fraction among the analyzed miRNAs ranged 
from ~0.6 to 1 with a median in all cases >0.95. However, 
in some of the individual miRNAs, the level of major miR- 
NAs is significantly altered (mostly decreased) in samples 
with the DICER/ hotspot mutations (Supplementary Table 
S8 and Figure 6D (right panel)). Although there is striking 
concordance between the effects of the mutations in RIa 
and RIIIb (Supplementary Table S8) the overall changes 
in fractions of the main miRNAs are very small. It is un- 
likely that such small changes in fractions of main miRNAs 
may cause any meaningful biological effect, however that 
may result in more substantial changes in some of the mi- 
nor isomiRs (Figure 6D (right panel)). 

The lists of miRNAs differentiated in pan-cancer and 
UCEC with different types of DICER/ mutations are 
shown in Supplementary Table S5, it includes a substan- 
tial number of the high-confidence miRNAs. Among the 
altered miRNAs are many miRNAs whose function is well 
recognized in cancer and many upregulated 3p-miRNAs are 
passengers of such miRNAs, examples of which are listed 
in Supplementary Table S7. A comparison of the DICER/ 
mutations with clinical characteristics of cancers did not re- 
veal any significant associations. 


DISCUSSION 


In this study, we provide a comprehensive pan-cancer anal- 
ysis of somatic mutagenesis in a panel of 29 miRNA bio- 
genesis genes in 33 adult-onset cancers. In total, we identi- 
fied 3,649 mutations, composed of 60% missense and 17% 


deleterious mutations. Overall, ~21% of the samples had 
at least one mutation, but the percentage differed substan- 
tially between the cancer types, reaching >40% in SKCM or 
COAD. The most frequently mutated genes were TNRC6A, 
DICERI, SMAD4 and ZCCH C11, with mutations in 2-3% 
of pan-cancer samples. The frequency of mutations in the 
miR NA biogenesis genes is similar to that in cancer-specific 
drivers, such as MYC, ALK, CACNAIA, POLE, BCL2, 
NOTCH2, MET, HRAS, FGFR and PIK3RI1, with pan- 
cancer mutation frequencies up to ~3% (122). Consistent 
with the potential cancer-specific role of the genes, some of 
them are much more frequently mutated in particular can- 
cers, for example, SMAD4 in PAAD (22%), READ (16%), 
COAD (14%), and ESCA (8%); DICERI in SKCM and 
UCEC (~9%); TNRC6A in STAD (9%) and UCEC (7%); 
ZCCHC6 in ACC (6.5%); SMAD2 in COAD and READ 
(~5%); and PRKRA in OV (4.3%). The frequencies of mu- 
tations in some of the genes are specifically and significantly 
enriched in particular cancer types. 

The high frequency of mutations in SMAD4 is consis- 
tent with the findings of many previous studies in which 
SMAD4 was analyzed as a transcription factor playing a 
role in the activation of many cancer-related genes in re- 
sponse to TGFB/BMP signaling (123-126). Comprehen- 
sive analysis of a large number of samples allowed us to 
define a profile of SMAD4 mutations in most common hu- 
man cancers covered by TCGA. It confirmed the high fre- 
quency of SMAD4 mutations in cancers such as PAAD, 
COAD and READ but also revealed a relatively high fre- 
quency of the mutations in STAD, LUAD and ESCA and 
lower frequencies in many other cancers (e.g. CESC, LUSC, 
HNC and CHOL) that are much less studied in the context 
of SMAD4 deficiency. With the large collection of muta- 
tions, we have identified 8 hotspot residues in the MH2 do- 
main, including some not reported before, and revealed that 
the proportions of hotspot and deleterious SMAD4 muta- 
tions differ substantially between cancers, with the fraction 
of hotspot mutations significantly enriched in READ (86%) 
and the lowest in PAAD (27%). This may suggest differ- 
ent mechanisms and effects of SMAD4 inactivation in dif- 
ferent cancer types. We showed that most of the hotspots 
localize in residues of charged AAs, which may affect the 
electrostatic interactions of the MH2 domain with corre- 
sponding domains in R-SMADs (e.g. SMAD2) critical for 
SMAD complex formation (86). Such an effect of both loss- 
of-charge and gain-of-charge SMAD4 mutations was com- 
putationally predicted in a previous study (127). Consis- 
tently, we showed that all the SMA D4 hotspots as well as 
much less frequently mutated hotspot residues in the cor- 
responding MH2 domain in SMAD2 localize at surfaces 
of the SMAD4:SMAD2?2 interaction, confirming the role 
of hotspot mutations in destabilizing the SMAD complex 
and hence revealing an inhibitory role of these mutations in 
downstream TGFB/BMP signaling. We showed that both 
the deleterious and hotspot SMAD4 mutations frequently 
(~80%) coincide with gene deletions, which indicates inac- 
tivation of the second allele and is consistent with previous 
observations of frequent LOH in the region (128-131). Al- 
though it was shown that SMAD4 plays a role in the activa- 
tion of miRNA gene transcription (summarized in (132)), 
the effect of SMAD4 mutations on miRNA expression 
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has never been tested. We identified numerous miRNAs 
(predominantly high-confidence miR NAs) differentially ex- 
pressed in samples with SMAD4 mutations and showed dif- 
ferent patterns of miRNA level changes induced by hotspot 
and deleterious mutations. While hotspot mutations pre- 
dominantly downregulate miRNAs, deleterious mutations 
both increase and decrease miRNA levels. This finding is 
consistent with the results obtained for SMAD4 knock- 
down by RNAi in hepatic stellate cells (73) and again shows 
a different effect of deleterious mutations that, as a result 
of NMD, most likely leads to complete loss of SMAD4 
and hotspot mutations that modify SMAD4 structure, af- 
fecting its interactions with R-SMADs and other proteins. 
Therefore, the effect of hotspot mutations may result not 
only from impeded R-SSMADs:SMAD4 complex formation 
but also from a shifted balance in the binding of competing 
coactivators and corepressors (such as p300/CBP and Ski 
or SnoN, respectively), whose contribution determines the 
outcome of signaling events (reviewed in (125,133)). 

Similar effects of predominant downregulation of 
miRNA levels were observed for p.S464Ter, the most fre- 
quent SMA D2 hotspot mutation, as well as for deleterious 
SMAD2 mutations. The p.S464Ter mutation is localized 
in the last exon of SMAD2 and truncates the last 5 AAs 
of the protein but most likely does not activate NMD. As 
the truncated fragment is important for efficient SMAD2 
phosphorylation and includes two serine residues (S465 
and S467) whose phosphorylation upon TGFB signaling 
is critical for SMAD4(SMAD2), complex formation 
(93,134,135), the mutation precludes the SMAD complex 
formation and TGFB/BMP signaling (123). We showed 
that the mutation is associated with decreased survival 
of cancer patients. Of note, two BRCA patients with the 
mutations showed a strikingly short OS. Regardless of 
its role in complexes with SMAD4, SMAD2 (along with 
other R-SMADs) may directly interact with a specific set 
of miRNA precursors, accelerating their processing by 
DROSHA (72,136). This posttranscriptional regulation 
of miRNA processing may also be affected by SMAD2 
mutations, as the MH2 domain of R-SMADs was shown 
to interact with the P68 RNA helicase participating in the 
recruitment of DROSHA and DGCR8 to pri-miRNAs 
(137). 

Another gene with characteristic hotspot mutations is 
DICERI. We found 21 hotspot mutations affecting 4 metal- 
ion-binding AA residues in the RHIb domain (E1705, 
D1709, D1810 and E1813) and 6 hotspot mutations in one 
AA residue ($1344) in RIIa. These hotspots were previ- 
ously detected and functionally characterized in various pe- 
diatric cancers, including cancers associated with DICER1 
syndrome (e.g. pleuropulmonary blastoma) and Wilms’ tu- 
mor, as summarized in (54,64). More recently, they were 
also investigated in thyroid adenomas (94) and the TCGA 
cohort, mostly in UCEC samples (65). It was shown that 
RIIIb hotspot mutations affect the RIIb cleavage of the 
5p-arm of pre-miRNAs, resulting in inefficient production 
of Sp-miR NAs (60-62,94). In a very recent study, it was also 
shown that the hotspot mutations of the $1344 residue, al- 
though located in RIJ]a, spatially interfere with RIIIb, re- 
sulting in the same effect as that observed for the muta- 
tions in RIIIb (65). The miRNA profiling performed in our 
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study showed a global decrease in 5p-miRNAs and a global 
increase in 3p-miRNAs in samples with both hotspot and 
deleterious DJCER/ mutations. A similar effect has been 
observed before (61,62,65,94, 138,139). The global decrease 
in 5Sp-miRNAs has been interpreted as the result of ineffi- 
cient 5-miRNA processing, but the increase in 3p-miRNAs 
was considered an artifact of miRNA level normalization 
(against the total number of miRNA reads) that, as a re- 
flection of the global Sp-miRNA deficit, resulted in a rel- 
ative increase in unaffected 3p-miRNAs (65). To eliminate 
this potential bias, we normalized the miRNA levels against 
the level of miR-45la, which is a DICER1-independent 
miRNA whose level is not affected by DICER1 deficiency 
(116). However, normalization against the miR-451a level 
did not eliminate the asymmetrical effect of the hotspot mu- 
tations also visible in the high-confidence miRNAs, show- 
ing that both the Sp-miRNA decreases and 3p-miR NA in- 
creases are real. We have additionally shown that the in- 
creased 3p-miRNAs strongly associate with cancer-related 
terms/pathways and include many miRNAs well recog- 
nized in cancer (Supplementary Table S6). The increase in 
3p-miRNAs may result from a lack of competition with 5p- 
miRNAs during transfer to and loading onto the RISC, ob- 
served as 5p/3p arm shifting or switching (140). However, 
this would require some alternative mechanism of releas- 
ing 3p-miRNAs from partially processed (nicked only at 
3p-arms) pre-miRNAs and transferring them to the RISC, 
bypassing the miRNA duplex stage. Although it was pre- 
viously speculated that AGO2 may play some role in such 
a process (19,20), in our opinion, this step warrants fur- 
ther investigation. Unlike the hotspot mutations, the dele- 
terious mutations (after normalization against miR-451a) 
almost exclusively decrease miRNA levels, which confirms 
their loss-of-function nature. Although the effect of delete- 
rious mutations is more profound for 5p-miRNAs, the mu- 
tations also affect 3p-miRNAs (~30%). 

With exception of hotspot mutations found in DICER], 
we found only a few mutations in other miRNA biogenesis 
genes, i.e. DROSHA, DGCRS and XPOS, which have been 
recently observed in different childhood cancers associated 
with DICER1 syndrome and in Wilms’ tumor (56—59,66). 
This may indicate specific functions of the mutations/genes 
characteristic of childhood cancers but not playing a role 
in adult-onset cancers. We also excluded the frequent oc- 
currence of specific indel hotspots in TRBP and XPOS pre- 
viously reported at high frequencies in cancers associated 
with MSI (67,68). 

In summary, in this study, we present a comprehensive 
pan-cancer analysis of somatic mutations accumulated in 
genes involved in miRNA biogenesis and function. We 
showed that some of these genes are specifically mutated 
in particular cancers. We identified many hotspot muta- 
tions, including some recurring in specific cancer types. We 
extended knowledge about the types and distribution of 
SMAD4 mutations and showed their effect on the expres- 
sion of miRNA genes. We also showed that all hotspot mu- 
tations in SMAD4 and SMAD2 affect AA residues located 
at the surface of the SMAD4:SMAD2 interaction. More- 
over, we distinguished and further characterized the effects 
of deleterious and hotspot missense mutations in DICERI, 
among others, showing that hotspot mutations in the RI- 
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IIa and RIIb domains not only decrease the levels of S5p- 
miRNAs but also increase the levels of 3p-miRNAs. As a 
result, we have identified numerous miRNAs that are sig- 
nificantly increased or decreased in samples with particular 
mutation types, including many well-known cancer-related 
miRNAs. We also linked some of the mutations with the 
patients’ clinical outcomes. Furthermore, we created a com- 
pendium of information presented as an atlas and maps of 
mutations in miRNA biogenesis genes that may be useful 
resources of information for studying a particular gene or 
cancer type. 
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